OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The impact of running headers and footers on proximity searching

Internal identifier: 001610 (Main/Exploration); previous: 001609; next: 001611

Authors: Kazem Taghva [United States]; Julie Borsack [United States]; Tom Nartker [United States]; Jeffrey Coombs [United States]; Ron Young [United States]

Source: SPIE proceedings series, 2004 (ISSN 1017-2653)

RBID : Pascal:04-0468840

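French descriptors: Recherche information; Numérisation; Proximité; Document numérisé

English descriptors: Digitized document; Digitizing; Information retrieval; Proximity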

Abstract

Hundreds of experiments over the last decade on the retrieval of OCR documents performed by the Information Science Research Institute have shown that OCR errors do not significantly affect retrievability. We extend those results to show that in the case of proximity searching, the removal of running headers and footers from OCR text will not improve retrievability for such searches.
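To make the term concrete, here is a minimal sketch of a proximity match, assuming a simple "two terms within N tokens" definition and a naive alphanumeric tokenizer. The function names, the sample page text, and the running header are hypothetical and are not taken from the paper. The toy input shows only the mechanism under study: a running header that falls between two query terms widens their token distance. It says nothing about corpus-level retrievability, which the abstract reports is not improved by removing such lines.

```python
import re

def tokenize(text):
    # Lowercase alphanumeric tokens; real retrieval engines use richer tokenizers.
    return re.findall(r"[a-z0-9]+", text.lower())

def within(text, term_a, term_b, max_distance):
    """Return True if the two terms occur at most max_distance tokens apart."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if tok == term_a.lower()]
    pos_b = [i for i, tok in enumerate(tokens) if tok == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Hypothetical OCR output: a running header sits between two pages of body text.
body_page_1 = "the repository stores nuclear"
running_header = "Annual Report 1994 Page 12"
body_page_2 = "waste in sealed casks"

with_header = "\n".join([body_page_1, running_header, body_page_2])
without_header = "\n".join([body_page_1, body_page_2])

print(within(with_header, "nuclear", "waste", 2))
print(within(without_header, "nuclear", "waste", 2))
```

In this toy case the header pushes the two terms six tokens apart, so a window of 2 misses the pair while a window of 6 or more still matches it.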


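Affiliations: Information Science Research Institute, University of Nevada, Las Vegas; Las Vegas, NV 89154-4021, USA (all five authors)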




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The impact of running headers and footers on proximity searching</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Borsack, Julie" sort="Borsack, Julie" uniqKey="Borsack J" first="Julie" last="Borsack">Julie Borsack</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nartker, Tom" sort="Nartker, Tom" uniqKey="Nartker T" first="Tom" last="Nartker">Tom Nartker</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Young, Ron" sort="Young, Ron" uniqKey="Young R" first="Ron" last="Young">Ron Young</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">04-0468840</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0468840 INIST</idno>
<idno type="RBID">Pascal:04-0468840</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000533</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000257</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000451</idno>
<idno type="wicri:doubleKey">1017-2653:2004:Taghva K:the:impact:of</idno>
<idno type="wicri:Area/Main/Merge">001662</idno>
<idno type="wicri:Area/Main/Curation">001610</idno>
<idno type="wicri:Area/Main/Exploration">001610</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">The impact of running headers and footers on proximity searching</title>
<author>
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Borsack, Julie" sort="Borsack, Julie" uniqKey="Borsack J" first="Julie" last="Borsack">Julie Borsack</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nartker, Tom" sort="Nartker, Tom" uniqKey="Nartker T" first="Tom" last="Nartker">Tom Nartker</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Young, Ron" sort="Young, Ron" uniqKey="Young R" first="Ron" last="Young">Ron Young</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01">
<s1>Information Science Research Institute, University of Nevada, Las Vegas</s1>
<s2>Las Vegas, NV 89154-4021</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Nevada</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Digitized document</term>
<term>Digitizing</term>
<term>Information retrieval</term>
<term>Proximity</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Recherche information</term>
<term>Numérisation</term>
<term>Proximité</term>
<term>Document numérisé</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Hundreds of experiments over the last decade on the retrieval of OCR documents performed by the Information Science Research Institute have shown that OCR errors do not significantly affect retrievability. We extend those results to show that in the case of proximity searching, the removal of running headers and footers from OCR text will not improve retrievability for such searches.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Nevada</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Nevada">
<name sortKey="Taghva, Kazem" sort="Taghva, Kazem" uniqKey="Taghva K" first="Kazem" last="Taghva">Kazem Taghva</name>
</region>
<name sortKey="Borsack, Julie" sort="Borsack, Julie" uniqKey="Borsack J" first="Julie" last="Borsack">Julie Borsack</name>
<name sortKey="Coombs, Jeffrey" sort="Coombs, Jeffrey" uniqKey="Coombs J" first="Jeffrey" last="Coombs">Jeffrey Coombs</name>
<name sortKey="Nartker, Tom" sort="Nartker, Tom" uniqKey="Nartker T" first="Tom" last="Nartker">Tom Nartker</name>
<name sortKey="Young, Ron" sort="Young, Ron" uniqKey="Young R" first="Ron" last="Young">Ron Young</name>
</country>
</tree>
</affiliations>
</record>
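The record above can also be read outside Dilib. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. One caveat: the record uses the `wicri:` and `inist:` prefixes without declaring them, and a namespace-aware parser will typically reject that. The sketch therefore binds both prefixes to placeholder URIs before parsing. Those URIs, the helper names, and the shortened sample record are assumptions made for illustration, not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumption): the record does not declare
# the wicri: and inist: prefixes, so they are bound here for parsing only.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}

def parse_record(xml_text):
    """Parse a <record> whose wicri:/inist: prefixes are undeclared."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    # Bind the prefixes on the root element (first "<record>" only).
    patched = xml_text.replace("<record>", f"<record {decls}>", 1)
    return ET.fromstring(patched)

def authors(record):
    """Author names from <titleStmt>; the same names recur under <analytic>."""
    path = "./TEI/teiHeader/fileDesc/titleStmt/author/name"
    return [name.text for name in record.findall(path)]

def keywords(record, scheme):
    """Terms of the <keywords> block with the given scheme attribute."""
    return [t.text for t in record.findall(f".//keywords[@scheme='{scheme}']/term")]

# Shortened, hypothetical excerpt with the same shape as the record above.
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<author>
<name sortKey="Taghva, Kazem">Kazem Taghva</name>
<affiliation wicri:level="2">
<inist:fA14 i1="01"><s3>USA</s3></inist:fA14>
</affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Information retrieval</term>
<term>Proximity</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

record = parse_record(SAMPLE)
print(authors(record))
print(keywords(record, "KwdEn"))
```

Reading authors from `titleStmt` only avoids counting each author twice, since the record repeats the full author list under `sourceDesc/biblStruct/analytic`.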

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001610 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001610 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:04-0468840
   |texte=   The impact of running headers and footers on proximity searching
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024